Background Clinical outcome of patients with triple-negative breast cancer (TNBC) is highly variable. This study aims to identify
and validate a prognostic protein signature for TNBC patients to reduce unnecessary adjuvant systemic therapy.

Methods Frozen primary tumors were collected from 126 lymph node-negative and adjuvant therapy-naive TNBC patients.
These samples were used for global proteome profiling in two series: an in-house training (n = 63) and a multicenter
test (n = 63) set. Patients who remained free of distant metastasis for a minimum of 5 years after surgery
were defined as having good prognosis. Cox regression analysis was performed to develop a prognostic signature,
which was independently validated. All statistical tests were two-sided.

Results An 11-protein signature was developed in the training set (median follow-up for good-prognosis 
patients = 117 months) and subsequently validated in the test set (median follow-up for good-prognosis 
patients = 108 months) showing 89.5% sensitivity (95% confidence interval [CI] = 69.2% to 98.1%), 70.5% speci- 
ficity (95% CI = 61.7% to 74.2%), 56.7% positive predictive value (95% CI = 43.8% to 62.1%), and 93.9% negative 
predictive value (95% CI = 82.3% to 98.9%) for poor-prognosis patients. The predicted poor-prognosis patients had 
higher risk to develop distant metastasis than the predicted good-prognosis patients in univariate (hazard ratio 
[HR] = 13.15; 95% CI = 3.03 to 57.07; P = .001) and multivariable (HR = 12.45; 95% CI = 2.67 to 58.11; P = .001) analysis.
Furthermore, the predicted poor-prognosis group had statistically significantly more breast cancer-specific
mortality. Using our signature as guidance, more than 60% of patients would have been exempted from unneces- 
sary adjuvant chemotherapy compared with conventional prognostic guidelines. 

Conclusions We report the first validated proteomic signature to assess the natural course of clinical TNBC.

JNCI J Natl Cancer Inst (2014) 106(2): djt376 doi:10.1093/jnci/djt376 



Triple-negative breast cancer (TNBC) is one of the most aggres- 
sive breast cancer subtypes. To date, there is no clinically avail- 
able targeted therapy for patients diagnosed with TNBC. Current 
guidelines for breast cancer treatment recommend that the majority
of lymph node-negative (LNN) breast cancer patients, including
TNBC patients, be treated with adjuvant chemotherapy (1).
Eventually approximately 30% of LNN TNBC patients develop 
distant metastasis (2) and could thus potentially benefit from adju- 
vant chemotherapy; this indicates that the majority of patients are 
currently being overtreated. The lack of highly sensitive and spe- 
cific prognostic markers is a major obstacle to predicting prognosis 
of TNBC patients. Moreover, there is an urgent need to identify 
potentially useful targets for therapy. Some commonly applied 
approaches for biomarker discovery, such as gene expression profiling,
have not yet succeeded in identifying highly sensitive and specific
markers related to disease progression of TNBC (3). Although
a few publications have claimed to identify gene signatures that
predict prognosis of TNBC (4-8), the reported signatures have 
limited clinical value because of their experimental design and lack 
of sufficient sensitivity and specificity. Therefore, it is desirable 
to identify a highly sensitive and specific prognostic signature for 
clinical application. 

Quantitative proteome analysis using nanoscale liquid chro- 
matography and tandem mass spectrometry (nLC-MS/MS) may 
complement these efforts because nLC-MS/MS quantitatively 
profiles a large portion of the human proteome (9). Some protein 
markers predicting prognosis (10) and therapy resistance (11) of 
breast cancer have already been identified using nLC-MS/MS- 
based comparative proteome profiling. In this study, we identified 
and independently validated a protein signature to predict 5-year 
metastasis-free survival of LNN patients who did not receive 
systemic adjuvant therapy, which would allow for the selection 
of TNBC patients who can be spared the toxicity of adjuvant
chemotherapy.

Methods 

Study Design 

Patient specimens used to develop and validate a prognostic signa- 
ture were collected in five European institutes between 1985 and 
2005. Triple negativity of tumor tissues was confirmed based on 
mRNA expression using real-time quantitative polymerase chain 
reaction with a reported criterion of estrogen receptor (ESR1) less
than 0.2, progesterone receptor (PGR) less than 0.1, and human epidermal
growth factor receptor 2 (HER2/ERBB2) less than
18 (12,13). With this criterion, we identified 271 and 61 TNBC
tumor tissues from local tissue bank and European Organization for 
Research and Treatment of Cancer (EORTC) collaborators, respec- 
tively. Tissues used in this study were restricted to LNN patients 
who had not received systemic adjuvant therapy. Patients who devel- 
oped distant metastasis as first event within 5 years after removal of 
the primary tumor were considered to have poor prognosis, whereas 
patients who remained free of distant metastasis for 5 years were 
defined as having good prognosis. Patients who had bilateral breast 
cancer were excluded. In addition, inclusion for microdissection and 
mass spectrometry (MS) profiling depended on sufficient invasive 
tumor area for microdissection and morphological quality. With the 
above-mentioned criteria, samples were subsequently rejected on 
the basis of clinical reasons [patients were diagnosed with positive
lymph nodes (n = 92), received adjuvant chemotherapy (n = 3), had insufficient
clinical follow-up (n = 19), or developed local relapse before
distant metastasis (n = 26)] and technical reasons [insufficient tumor
tissue (n = 33), indistinguishable morphology, and insufficient tumor
area for microdissection (n = 32)]. As a consequence, 63 and 64 tumors
from training and test sets were prepared for nLC-MS/MS profiling, 
respectively. One sample was not successfully measured because of 
machinery failure. Clinical characteristics of the included (n = 126) 
and excluded (n = 206) subjects are summarized in Supplementary 
Table 1 (available online). nLC-MS/MS data from 63 samples were 
used for development of the protein signature (training set), and data 
from the other 63 samples were used for independent validation (test 
set) (Table 1; Supplementary Table 2, available online). This study 
was approved by the local institutional medical ethics committee 
(MEC 02.953), and wherever possible we adhered to the Reporting 
Recommendations for Tumor Marker Prognostic Studies (14). 
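
The inclusion rule and the sample accounting described above can be checked with a short sketch. The function name and the dictionary labels are illustrative assumptions, not part of the study; the thresholds and counts are those quoted in the text, and the expression values are assumed to be on the scale used by the cited criterion (12,13).

```python
# Thresholds quoted in the Study Design text for RT-qPCR expression levels.
ESR1_MAX, PGR_MAX, ERBB2_MAX = 0.2, 0.1, 18.0


def is_triple_negative(esr1: float, pgr: float, erbb2: float) -> bool:
    """Hypothetical helper: True when all three mRNA levels fall below the cutoffs."""
    return esr1 < ESR1_MAX and pgr < PGR_MAX and erbb2 < ERBB2_MAX


# Sample accounting, using the counts reported in the text.
identified = 271 + 61  # local tissue bank + EORTC collaborators
clinical_exclusions = {
    "positive lymph nodes": 92,
    "adjuvant chemotherapy": 3,
    "insufficient follow-up": 19,
    "local relapse before distant metastasis": 26,
}
technical_exclusions = {
    "insufficient tumor tissue": 33,
    "morphology or tumor area unsuitable for microdissection": 32,
}
prepared = (
    identified
    - sum(clinical_exclusions.values())
    - sum(technical_exclusions.values())
)
measured = prepared - 1  # one sample lost to machinery failure
excluded = identified - measured

print(prepared, measured, excluded)  # 127 126 206
```

The totals agree with the 63 + 64 tumors prepared for profiling and with the 126 included and 206 excluded subjects reported in the text.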

Experimental Procedures 

Samples were prepared using a dedicated procedure described previously (15)
(Supplementary Methods, available online). Tissue samples were cut
into 8-µm frozen sections and then microdissected to obtain approximately
4000 breast cancer epithelial cells. Proteins were extracted
from microdissected cells, and this was followed by denaturation,
reduction, and alkylation. Protein samples were digested at 37°C for
4 hours using MS-grade trypsin (Promega, Madison, WI) at a 1:4
(enzyme/protein) ratio and then acidified for further analysis.

Global proteome profiles of the TNBC samples were
recorded on an nLC system coupled to an LTQ-Orbitrap-XL MS system
(ThermoElectron, Bremen, Germany) (Supplementary Methods,
available online). Peptide mixtures were separated on the nLC system
with a 3-hour binary gradient (mobile phase A: water; mobile
phase B: acetonitrile) in a 3-µm C18 silica-packed 50-cm capillary
column with 75-µm inner diameter. Mass spectra were acquired



Table 1. Clinical and pathological characteristics of patients and their tumors*

| Variables | Training set: EMC (n = 63) | Test set (11-protein signature): multicenter (n = 63) | Test set: EMC (n = 30) | Test set: EORTC (n = 33) |
| --- | --- | --- | --- | --- |
| Age, y | | | | |
| Mean (SD) | 51 (14) | 56 (13) | 54 (13) | 59 (13) |
| <40 | 13 (20.6) | 9 (14.3) | 6 (20.0) | 3 (9.1) |
| 41-55 | 25 (39.7) | 23 (36.5) | 12 (40.0) | 11 (33.3) |
| 56-70 | 19 (30.2) | 22 (34.9) | 8 (26.7) | 14 (42.4) |
| >70 | 6 (9.5) | 9 (14.3) | 4 (13.3) | 5 (15.2) |
| Menopausal status | | | | |
| Premenopausal | 34 (54.0) | 26 (41.3) | 16 (53.3) | 10 (30.3) |
| Postmenopausal | 29 (46.0) | 37 (58.7) | 14 (46.7) | 23 (69.7) |
| Tumor size, cm | | | | |
| Mean (SD)† | 2.9 (1.5) | 2.6 (1.1) | 2.9 (1.2) | 2.3 (0.9) |
| pT1, <2 cm | 22 (34.9) | 22 (34.9) | 6 (20.0) | 16 (48.5) |
| pT2-4, >2 cm | 39 (61.9) | 35 (55.6) | 20 (66.7) | 15 (45.5) |
| Unknown | 2 (3.2) | 6 (9.5) | 4 (13.3) | 2 (6.1) |
| Grade | | | | |
| Grade 1 | 0 (0.0) | 2 (3.2) | 2 (6.7) | 0 (0.0) |
| Grade 2 | 4 (6.3) | 12 (19.0) | 2 (6.7) | 10 (30.3) |
| Grade 3 | 44 (69.8) | 43 (68.3) | 21 (70.0) | 22 (66.7) |
| Unknown | 15 (23.8) | 6 (9.5) | 5 (16.7) | 1 (3.0) |
| Metastasis within 5 years | | | | |
| Yes | 25 (39.7) | 19 (30.2) | 9 (30.0) | 10 (30.3) |
| No | 38 (60.3) | 44 (69.8) | 21 (70.0) | 23 (69.7) |

* Data are No. (%) unless otherwise stated. EMC = Erasmus Medical Center; EORTC = European Organization for Research and Treatment of Cancer; SD = standard deviation.

† Samples with recorded tumor size were used for calculation.
over a mass-to-charge ratio (m/z) range of 400 to 1800 at a resolving
power of 30000 at 400 m/z. The five most intense parent
ions from the full scan were isolated and fragmented by collisionally
activated dissociation in the linear ion trap. Dynamic exclusion was
used to increase the number of parent ions selected for fragmentation.
Recorded raw nLC-MS/MS data have been submitted to
ProteomeXchange (accession number: PXD000260).
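
The acquisition logic described above (the five most intense parent ions per full scan, with dynamic exclusion) can be illustrated with a deliberately simplified sketch. It is not the instrument's firmware logic: the function name, the rounding of m/z to two decimals, and the permanent exclusion list are assumptions made for illustration, whereas a real instrument applies a mass tolerance and a time-limited exclusion window.

```python
def select_precursors(full_scan, excluded, top_n=5, mz_range=(400.0, 1800.0)):
    """Pick the top_n most intense ions inside mz_range that are not excluded.

    full_scan: list of (mz, intensity) tuples from one survey scan.
    excluded: set of rounded m/z values already fragmented; updated in place.
    """
    lo, hi = mz_range
    candidates = [
        (mz, intensity)
        for mz, intensity in full_scan
        if lo <= mz <= hi and round(mz, 2) not in excluded
    ]
    # Most intense ions first.
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    chosen = [mz for mz, _ in candidates[:top_n]]
    # Dynamic exclusion: do not pick the same ions again in later scans.
    excluded.update(round(mz, 2) for mz in chosen)
    return chosen


# Toy survey scan (hypothetical values); the ion at m/z 1900 is out of range.
scan = [(450.2, 9e5), (512.3, 7e5), (600.1, 6e5), (733.4, 5e5),
        (820.9, 4e5), (905.5, 3e5), (1010.7, 2e5), (1900.0, 1e6)]
exclusion_list = set()
first = select_precursors(scan, exclusion_list)
second = select_precursors(scan, exclusion_list)
```

In this toy scan the second call returns the two lower-intensity ions that the first call skipped, which is how exclusion increases the number of distinct parent ions fragmented.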

Statistical Analysis 

The recorded MS spectra from the training and test set were sepa- 
rately analyzed in MaxQuant Software (free-ware available from 
www.maxquant.org, version 1.1.1.36) (16). To identify a prognostic 
protein signature, Cox regression analysis was performed to associate
protein abundance with metastasis-free survival time of patients
in the training set. Cox regression coefficients and corresponding P
values were determined for all tested proteins. The assumption of 
proportionality was verified by a test based on the Schoenfeld resid- 
uals. Prognostic proteins were selected with a P value cutoff of less 
than .01. Weighted protein abundance of the prognostic proteins 
was computed by multiplication of their protein abundance and cor- 
responding Cox regression coefficients. A relapse score was calcu- 
lated by summation of weighted protein abundance of all protein(s) 
in a given predictive model, followed by z score transformation 
of summed score. Multiple predictive models were constructed 
from the training set by starting with the protein with the lowest 
P value and stepwise adding one more protein in the next model. 
Efficiency of these models was then assessed by summation of three 
parameters: 1) area under the curve (AUC) of receiver operating 
[Figure 1 flow chart. Sample sources: 93 EMC samples (EMC tissue bank), randomized and divided into two parts (63 and 30 samples), and 34 EORTC samples (recruited from the EORTC consortium, 2011). Training set (EMC cohort): 63 samples. Test set (multicenter cohort): 64 samples, one of which was lost because of technical failure. Processing of each sample: 1) LCM of tumor area; 2) protein extraction and 4-hour trypsin digestion; 3) 3-hour gradient nLC-MS/MS profiling. Global proteome profiling (2010-05/06 for the training set, 2012-06/07 for the test set) was followed by MaxQuant analysis (protein identifications: 3660 and 4385, respectively) and data filtering. Training set (981 proteins): 63 patients (25 poor and 38 good prognosis); Cox regression analysis: 23 proteins (P < .01); multivariable linear regression model (23 models); 11-protein signature. Test set: 63 patients (19 poor and 44 good prognosis); independent validation of the 11-protein signature.]

Figure 1. Flow chart of experimental design for the development and validation of an 11-protein signature. EMC = Erasmus University Medical
Center; EORTC = European Organization for Research and Treatment of Cancer; LCM = laser capture microdissection; nLC-MS/MS = nanoscale liquid
chromatography and tandem mass spectrometry.
characteristic (ROC) curve (17), 2) Youden's index at 100% sensitivity
(sensitivity + specificity - 1), and 3) reversed model size (1 - n_i/n,
where n_i is the number of proteins in a defined model, and n is the
total number of prognostic proteins). The model with the highest
efficiency was considered the best prognostic signature.
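
The scoring steps in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the z score uses the population standard deviation (the text does not say which estimator was used), and the reversed model size is read as 1 - n_i/n.

```python
from statistics import mean, pstdev


def nested_models(proteins_by_p_value):
    """Model k uses the k proteins with the lowest Cox P values."""
    return [proteins_by_p_value[:k] for k in range(1, len(proteins_by_p_value) + 1)]


def relapse_scores(samples, coefficients):
    """Sum coefficient-weighted abundances per patient, then z-transform.

    samples: list of dicts mapping protein -> abundance (one dict per patient).
    coefficients: dict mapping protein -> Cox regression coefficient.
    """
    raw = [sum(coefficients[p] * s[p] for p in coefficients) for s in samples]
    mu, sd = mean(raw), pstdev(raw)  # population SD is an assumption
    return [(r - mu) / sd for r in raw]


def model_efficiency(auc, specificity_at_full_sensitivity, n_model, n_total):
    """AUC + Youden's index at 100% sensitivity + reversed model size."""
    youden = 1.0 + specificity_at_full_sensitivity - 1.0
    reversed_size = 1.0 - n_model / n_total
    return auc + youden + reversed_size


# Toy data with hypothetical proteins "A" and "B".
toy_samples = [{"A": 1.0, "B": 2.0}, {"A": 2.0, "B": 1.0}, {"A": 3.0, "B": 3.0}]
toy_scores = relapse_scores(toy_samples, {"A": -0.5, "B": 1.0})
```

Because each term of the efficiency lies between 0 and 1, the sum is bounded by 3, and the third term penalizes larger models: a model using all n proteins contributes 0 for size.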

The cutoff of the prognostic protein signature was determined 
from the ROC curve of the training set to ensure 100% sensitivity and
highest specificity. The protein signature was validated in the test set 
at the determined cutoff. Kaplan-Meier survival curves and log-rank 
test were performed to evaluate the differences in the time to distant 
metastasis of predicted good and poor prognosis groups. Univariate 
and multivariable analyses with Cox proportional hazard model were 
used to assess the prognostic value of the protein signature with and 
without consideration of the individual clinical prognostic variables. 
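
The cutoff rule stated above (100% sensitivity with the highest specificity in the training set) can be sketched as below. The function names and the toy scores are hypothetical; the sketch assumes that a patient is predicted to have poor prognosis when the relapse score is at or above the cutoff.

```python
def cutoff_at_full_sensitivity(scores, is_poor):
    """Return (cutoff, specificity) such that every poor-prognosis patient is flagged.

    A patient is predicted poor prognosis when score >= cutoff.
    """
    poor = [s for s, flag in zip(scores, is_poor) if flag]
    good = [s for s, flag in zip(scores, is_poor) if not flag]
    # Highest cutoff that still flags every poor-prognosis patient.
    cutoff = min(poor)
    # Fraction of good-prognosis patients correctly left below the cutoff.
    specificity = sum(s < cutoff for s in good) / len(good)
    return cutoff, specificity


def predict_poor(score, cutoff):
    """Apply a fixed, predetermined cutoff to a new (test set) score."""
    return score >= cutoff


# Toy training scores (hypothetical values).
toy_scores = [-1.2, -0.5, 0.1, 0.4, 1.3]
toy_is_poor = [False, False, True, False, True]
toy_cutoff, toy_specificity = cutoff_at_full_sensitivity(toy_scores, toy_is_poor)
```

Raising the cutoff any further would miss the lowest-scoring poor-prognosis patient, so this choice maximizes specificity subject to full sensitivity. The cutoff is then held fixed when the signature is applied to the test set.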

Pathway analysis was performed to interpret biological function
of signature proteins using gene set enrichment analysis (18). The
Supplementary Methods (available online) describe detailed information
about MaxQuant settings, statistical software packages, and
pathway analysis.

Results 

The training and test samples were profiled by nLC-MS/MS in 2010 
and 2012, respectively (Figure 1). The baseline clinical features of 
patients were similar between the Erasmus University Medical Center 
(EMC) training and multicenter test set, although patients were 
slightly older and more patients were classified with grade 2 tumors 
in the test set (Table 1). The median follow-ups for the good-progno- 
sis patients in the training and test set were 117 (range = 61-257) and 
108 (range = 61-234) months, respectively. Patients were recruited in 
five EORTC institutes between 1985 and 2005. Clinical data used for 
data analysis were updated until October 2012. 




[Figure 2 plot. Axis: model efficiency (0.00 to 2.50). Legend: AUC of ROC curve; specificity at 100% sensitivity; reversed model size. Labeled value: 2.31.]

Figure 2. Selection of the best predictive signature from 23 prognostic
models developed in the training set. The different models were created
from 23 prognostic proteins (Cox regression analysis: P < .01), starting
with the protein with the lowest P value and gradually adding one more
protein at a time, thereby constructing 23 different prognostic models.
Model efficiency was considered by three aspects: 1) area under
the curve (AUC) of the receiver operating characteristic (ROC) curve,
2) specificity at 100% sensitivity, and 3) reversed model size (1 - 1/n,
where n is number of used protein[s] for the model). The model with
the highest model efficiency was considered as the best model, resulting
in selection of the 11-protein signature (model efficiency = 2.31) for
validation.
Development of a Proteomic Prognostic Signature for TNBC Patients

Proteome profiles for training and test samples were independently 
generated by nLC-MS/MS (see Supplementary Methods, avail- 
able online). A stringent filtering procedure resulted in a total of 
981 proteins for development of a prognostic signature (Figure 1; 
Supplementary Figure 1, available online). Good reproducibility
and quantitative precision were observed for proteins quantified
in replicate samples when the same filtering criteria were applied (see
Supplementary Methods, available online).

The 981 proteins were used to develop a prognostic signature
in the training set. Using univariate Cox analysis, a panel of 23 proteins
was statistically significantly associated with metastasis-free
survival of patients (P < .01) (Figure 1; Supplementary Figure 1,
available online). Twenty-three prognostic models were derived
from these 23 proteins using a multivariable linear regression
model, of which an 11-protein model performed fairly comparably
with the models with 15 or more proteins based on the AUC
of the ROC curve and specificity at 100% sensitivity to predict
poor-prognosis patients. By further considering model size, we calculated
efficiency for all 23 models, and the 11-protein model showed the
highest model efficiency at 2.31 (Figure 2). Detailed information
for these 11 proteins is listed in Table 2 and Supplementary Table 3
(available online). Of these 11 proteins, 10 were upregulated
(CMPK1, AIFM1, FTH1, EML4, GANAB, CTNNA1,
AP1G1, STX12, AP1M1, and CAPZB), whereas one protein was
downregulated (MTHFD1) in good-prognosis patients in the
training set. The ROC curve derived from the 11-protein signature
showed good sensitivity and specificity with an AUC of 0.95
(Figure 3A). A cutoff was selected at a relapse score of zero to
identify good-prognosis patients (negative score), at which maximal
specificity was reached at 100% sensitivity (Figure 3A, green dot).
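
The direction of each protein's association can be read from the sign of its Cox coefficient, because the hazard ratio is the exponential of the coefficient. The sketch below checks this relation against the coefficient and hazard ratio pairs printed in Table 2; the variable names are illustrative assumptions.

```python
import math

# (Cox coefficient, hazard ratio) pairs as printed in Table 2.
table2 = {
    "CMPK1": (-0.587, 0.556),
    "AIFM1": (-0.860, 0.423),
    "FTH1": (-0.400, 0.670),
    "MTHFD1": (1.178, 3.247),
    "EML4": (-0.576, 0.562),
    "GANAB": (-1.050, 0.350),
    "CTNNA1": (-1.115, 0.328),
    "AP1G1": (-0.997, 0.369),
    "STX12": (-0.675, 0.509),
    "AP1M1": (-0.846, 0.429),
    "CAPZB": (-0.916, 0.400),
}

# Largest disagreement between ln(hazard ratio) and the printed coefficient.
max_gap = max(abs(math.log(hr) - coef) for coef, hr in table2.values())

# Negative coefficient: higher abundance goes with a lower metastasis hazard.
lower_hazard = sorted(g for g, (coef, _) in table2.items() if coef < 0)
higher_hazard = sorted(g for g, (coef, _) in table2.items() if coef > 0)
```

The ten proteins with negative coefficients are the ones described as upregulated in good-prognosis patients, and MTHFD1 is the only protein with a hazard ratio above 1.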

Validation of the 11-Protein Signature in a Multicenter 
Cohort of TNBC Patients 

The 11-protein signature was validated on the test cohort, resulting
in an ROC curve with an AUC of 0.83 (Figure 3B). Using the predetermined
cutoff (relapse score = 0), patients predicted as having
a poor prognosis (positive score) had much worse 5-year metastasis-free
survival (log-rank P < .001) (Figure 3C) and breast cancer-related
survival (log-rank P < .001) (Figure 3D) than predicted
good-prognosis patients. In the predicted good-prognosis group,



Table 2. Eleven signature proteins and their prognostic information in training set

| Protein ID | Gene name | Protein description | Subcellular localization* | Cox coefficient (hazard ratio)† | P (FDR)‡ |
| --- | --- | --- | --- | --- | --- |
| P30085 | CMPK1 | UMP-CMP kinase | Nucleus/cytoplasm | -0.587 (0.556) | .00 (0.123) |
| O95831 | AIFM1 | Apoptosis-inducing factor 1, mitochondrial | Mitochondrion/nucleus/cytoplasm | -0.860 (0.423) | .001 (0.199) |
| P02794 | FTH1 | Ferritin heavy chain | Cytoplasm/intracellular matrix | -0.400 (0.670) | .001 (0.199) |
| P11586 | MTHFD1 | C-1-tetrahydrofolate synthase, cytoplasmic | Cytoplasm | 1.178 (3.247) | .001 (0.199) |
| Q9HC35 | EML4 | Echinoderm microtubule-associated protein-like 4 | Cytoplasm/cytoskeleton/microtubule | -0.576 (0.562) | .001 (0.267) |
| Q14697 | GANAB | Neutral alpha-glucosidase AB | Endoplasmic reticulum/Golgi apparatus/melanosome | -1.050 (0.350) | .002 (0.312) |
| P35221 | CTNNA1 | Catenin alpha-1 | Cytoplasm/cytoskeleton/cell junction/cell membrane | -1.115 (0.328) | .002 (0.312) |
| O43747 | AP1G1 | AP-1 complex subunit gamma-1 | Golgi apparatus/cytoplasmic vesicle/clathrin-coated vesicle membrane | -0.997 (0.369) | .003 (0.332) |
| Q86Y82 | STX12 | Syntaxin-12 | Endosome membrane/Golgi apparatus membrane | -0.675 (0.509) | .003 (0.332) |
| Q9BXS5 | AP1M1 | AP-1 complex subunit mu-1 | Golgi apparatus/cytoplasmic vesicle/clathrin-coated vesicle membrane | -0.846 (0.429) | .004 (0.398) |
| P47756 | CAPZB | F-actin-capping protein subunit beta | Cytoplasm/cytoskeleton | -0.916 (0.400) | .005 (0.398) |

* Adapted from UniProt Knowledgebase.

† Cox coefficients were calculated by natural logarithm (ln) of hazard ratios.

‡ P values were computed by Cox regression analysis, and the corresponding false discovery rate (FDR) of the proteins is reported in parentheses.
31 of 33 patients did not develop distant metastasis (negative predictive
value = 93.9%; 95% confidence interval [CI] = 82.3% to
98.9%) (Figure 3, C and E). In contrast, 17 of 30 patients developed
distant metastasis in the predicted poor-prognosis group (positive
predictive value = 56.7%; 95% CI = 43.8% to 62.1%) (Figure 3, C
and E). Overall, our signature resulted in a sensitivity of 89.5% (95%
CI = 69.2% to 98.1%) and specificity of 70.5% (95% CI = 61.7%
to 74.2%) to predict poor-prognosis patients in the test set. In
univariate Cox analyses, the predicted poor-prognosis patients also had
statistically significantly higher risk of developing distant metastases
(hazard ratio [HR] = 13.15; 95% CI = 3.03 to 57.07; P = .001)
(Figure 3C; Table 3) and breast cancer-related death (HR = 22.78;
Figure 3. Development and multicenter validation of the 11-protein signature.
A) Receiver operating characteristic (ROC) curve of the 11-protein
signature in the training set (red solid line: area under the curve
[AUC]). A cutoff was chosen to ensure maximal sensitivity to identify
the good-prognosis patients with the highest possible specificity in the
training set (green dot). Other models developed from 23 prognostic
proteins were also plotted as ROC curves (black dashed lines). B) ROC
curve of the 11-protein signature in the test set. Kaplan-Meier analysis
shows that the 11-protein signature is strongly predictive of metastasis-free
survival (C) and breast cancer (BC)-related survival (D). CI = confidence
interval; HR = hazard ratio. E) Waterfall plot stratifies two groups
of triple-negative breast cancer patients with different predicted prognosis
in the test set. NPV = negative predictive value; PPV = positive
predictive value. All statistical tests were two-sided.






Vol. 106, Issue 2 | djt376 | February 5, 2014 






95% CI = 3.00 to 173.08; P = .003) (Figure 3D; Table 3) than those
predicted to have good prognosis. The 11-protein signature was
also prognostic in subgroups of patients with different menopausal
status (Figure 4, A and B) and tumor size (Figure 4, C and D).
Furthermore, the 11-protein signature was a strong independent
prognostic factor to predict risk of distant metastasis (HR = 12.45;
95% CI = 2.67 to 58.11; P = .001) and of breast cancer-related
death (HR = 36.08; 95% CI = 4.00 to 325.67; P = .001) in a multivariable
Cox regression model after correction for the contribution of
traditional prognostic factors (Table 3). No other prognostic factors,
such as age, menopausal status, tumor size, or tumor grade of
the patients, were statistically significantly associated with metastasis-free
survival in univariate and multivariable analyses (Table 3).
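
The four test-set performance figures reported above follow directly from the counts given in the text (17 of 30 predicted poor-prognosis patients and 2 of 33 predicted good-prognosis patients developed distant metastasis). The sketch below reproduces the point estimates only; the function name is an illustrative assumption, and the confidence intervals are not recomputed because the text does not state the interval method.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Point estimates for a binary classifier from its confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# "Positive" means predicted poor prognosis (relapse score above the cutoff).
metrics = diagnostic_metrics(tp=17, fn=2, fp=13, tn=31)
percentages = {name: round(100 * value, 1) for name, value in metrics.items()}

print(percentages)
# {'sensitivity': 89.5, 'specificity': 70.5, 'ppv': 56.7, 'npv': 93.9}
```

The row totals (19 patients with metastasis, 44 without) match the poor and good prognosis counts of the test set.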

Currently, there is no specific clinical guideline that recommends
treatment for TNBC patients. Two clinical consensus criteria, St.
Gallen (19) and National Institutes of Health (NIH) (20), are often
applied to guide treatment of breast cancer cases. In our test set,
91% (n = 40 of 44) and 95% (n = 38 of 40) of the good-prognosis
patients would be classified as high risk and therefore would be
guided to receive apparently unnecessary adjuvant chemotherapy
using St. Gallen and NIH criteria, respectively (Table 4). On the
other hand, only 30% (n = 13 of 44) of the good-prognosis patients
would be classified for adjuvant chemotherapy using the 11-protein
signature (Table 4). Therefore, more than 60% of patients in
the test set would have been exempted from unnecessary adjuvant
chemotherapy using the 11-protein signature as guidance, compared
with St. Gallen and NIH criteria.
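
The "more than 60%" figure can be reproduced from the good-prognosis counts quoted above. This is a simple arithmetic sketch with illustrative variable names; note that the NIH comparison uses 40 rather than 44 patients because of missing clinical information.

```python
# (guided to chemotherapy, evaluable good-prognosis patients) per method.
good_prognosis_treated = {
    "St. Gallen": (40, 44),
    "NIH": (38, 40),
    "11-protein signature": (13, 44),
}

# Percentage of good-prognosis patients who would receive chemotherapy.
rates = {m: 100 * k / n for m, (k, n) in good_prognosis_treated.items()}

# Percentage-point reduction when the signature replaces each guideline.
spared_vs_st_gallen = rates["St. Gallen"] - rates["11-protein signature"]
spared_vs_nih = rates["NIH"] - rates["11-protein signature"]

print(round(spared_vs_st_gallen, 1), round(spared_vs_nih, 1))  # 61.4 65.5
```

Both differences exceed 60 percentage points among good-prognosis patients.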

Pathway Analysis of Prognostic Signature 

The function of the 11 signature proteins related to progression 
of TNBC was interpreted in silico by gene set enrichment analy- 
sis. To overcome low identification rate of our proteomic data (see 
Supplementary Methods, available online), we used gene expres- 
sion data from 63 in-house available TNBC samples, of which 
47 samples were also included in our global proteome profiling. 
An overall statistically significant correlation between transcrip- 
tome and proteome in the 47 samples indicated the validity of 
using gene expression to interpret the molecular functions related 
to TNBC progression and expression of signature proteins (see 
Supplementary Data, available online). In total, 10 proteins were
matched with their coding genes. Biological pathways related to 
good prognosis of the TNBC patients were mainly associated with 
immune response (eg, modulation of cytokines, antigen processing 
and presentation, and activation of T cells, B cells, and natural killer 
cells) and cell death (eg, ceramide signaling, tumor necrosis fac- 
tor-mediated apoptosis, FAS-FAS ligand-mediated apoptosis, and 
caspase cascade). Nine of 10 genes were associated with good prognosis
of TNBC patients (CMPK1, AIFM1, FTH1, EML4, GANAB,
CTNNA1, AP1G1, STX12, and CAPZB), of which FTH1, GANAB,
and STX12 were associated with 58, 18, and 3 enriched pathways,
respectively, related to immune response or cell death at the recommended
false discovery rate of less than 0.25 (Supplementary Table 4, available
online). On the other hand, cell metabolism (eg, metabolism of
nucleotides and noncoding RNA) and transport of macromolecules 
(eg, transport of mature transcripts and ribonucleoproteins and 
export of proteins) were key pathways related to poor prognosis 
of TNBC patients. MTHFD1 was the only protein upregulated 
Figure 4. Kaplan-Meier analysis of metastasis-free survival in subgroups
of triple-negative breast cancer patients in the test set. A)
Premenopausal patients. B) Postmenopausal patients. Hazard ratio
(HR) and 95% confidence interval (CI) could not be computed because
of the absence of metastatic events in one of the tested groups.
in TNBC patients with poor prognosis and was associated with
two metabolic pathways (metabolism of nucleotides and noncoding
RNA) with false discovery rate less than 0.25 (Supplementary
Table 4, available online).

Discussion 

Prognosis of LNN TNBC patients is extremely heterogeneous 
and often not associated with conventional prognostic parameters 
(patient age, tumor size, and tumor grade) (21), concordant with 
this study. Approximately 30% of TNBC patients eventually expe- 
rience distant relapse (2). Therefore, a substantial proportion of 
patients are being overtreated with systemic adjuvant therapy. In 
this study, we identified and independently validated a clinically
relevant prognostic 11-protein signature for TNBC, which could
be used to reduce the number of patients who would receive unnecessary
adjuvant chemotherapy.

To translate the 11-protein signature into a clinically useful
assay, two future steps are essential. First, a quantitative, targeted, 



Table 4. Comparison of the 11-protein signature with currently applied clinical consensus criteria on treatment of breast cancer

Patients guided to receive adjuvant chemotherapy in the test set:

| Method | Poor prognosis | Good prognosis |
| --- | --- | --- |
| St. Gallen* | 18/18 (100%)‡ | 40/44 (91%) |
| NIH† | 14/17 (82%)‡ | 38/40 (95%)‡ |
| 11-protein signature | 17/19 (89%) | 13/44 (30%)§ |

* St. Gallen consensus criteria: tumor >2 cm, ESR1 negative, grade 2-3, patient aged <35 years (one of these criteria).

† National Institutes of Health (NIH) guideline: tumor >1 cm.

‡ Patients with missing clinical information were excluded from these analyses.

§ A statistically significant improvement.

and ready-to-use assay needs to be developed for absolute quantification
of the signature proteins. The most promising candidate for
such an assay would be a mass spectrometry-based selected reaction
monitoring assay (22). Second, such a targeted assay can then
be incorporated into future clinical trials and subsequently used
for clinical decision making for treatment options. Furthermore,
some of the signature proteins may serve as potential targets for
novel therapies.

Some signature proteins have been reported to be prognostic in 
breast cancer or other cancers. The prognostic role of these pro- 
teins is summarized in three categories. First, expression of some 
proteins is directly linked to disease progression. For instance, 
decreased levels of FTH1 protein have been associated with 
lymph node metastasis in colorectal cancer (23). Similarly, abnormal
expression of CTNNA1 protein has been related to poor
patient survival in invasive breast cancer (24). Also, an association has
been observed between increased expression of GANAB protein 
and favorable prognosis in head and neck cancer (25). Second, 
genetic polymorphisms of some proteins showed important prog- 
nostic values. Single nucleotide polymorphisms of CMPK1 have 
been associated with prognosis of non-small cell lung cancer (26) 
and pancreatic cancer (27) patients treated with gemcitabine-based 
chemotherapeutics. A correlation has also been reported between 
a 1958AA single nucleotide polymorphism of MTHFD1 gene and 
poor clinical outcome of premenopausal breast cancer patients 
(28). Moreover, genetic mutations of certain protein-coding genes
may also be prognostic. The EML4-ALK fusion gene has been suggested
to be associated with relatively favorable prognosis of non-small
cell lung cancer patients (29). We speculate that there may be a
correlation between expression of these proteins and their genetic 
variance, which needs to be further investigated. 

Pathway analysis showed that immune response, cell death, 
cell metabolism, and transport of macromolecules are the major 
underlying molecular mechanisms of the signature proteins 
related to prognosis of TNBC patients. These findings are in concordance
with previous observations. Lehmann and colleagues
identified an immunomodulatory subtype that was associated with 
favorable clinical outcome (30). Rody and colleagues identified a 
signature of high B-cell and low IL-8 metagenes that predicted 
good prognosis of TNBC patients (5), and Yau and coworkers 
reported a 14-gene signature linked to immune/inflammatory 
cytokine regulation in which the majority of genes were associated 
with good prognosis of TNBC patients (8). In our prognostic signature,
three proteins (FTH1, GANAB, and STX12) are associated
with immunomodulation and cell death-associated pathways.
FTH1 is an iron storage protein and indirectly regulates the ratio
of CD4+ T cells and CD8+ T cells by altering iron distribution
(31). Increased CD8+ T cells may help to eliminate tumor cells
(32), which would be favorable for survival of TNBC patients.
Molecular functions of GANAB and STX12 related to immune 
response and cell death have not been well studied. Therefore, fur- 
ther studies are required to understand the functions of these pro- 
teins in disease progression. On the other hand, it is known that 
the luminal androgen receptor subtype of TNBC, with aberrant 
alteration of cell metabolism, is related to adverse clinical outcome 
of TNBC patients (30). In our signature, MTHFD1 is associated
with poor prognosis and is involved in folate metabolism. It has 
been suggested that increased concentration of folate resulted 
in a dose-dependent downregulation of tumor suppressor genes 
in breast cancer cell lines due to increased DNA methylation of 
their promoter region (33), which may indicate the importance of 
MTHFD1 in TNBC progression. 



Our study has some limitations. First, microdissection-based 
sample preparation is tedious and may introduce additional biases 
between diagnostic individuals and laboratories. Second, the 
nLC-MS/MS-based platform is difficult to apply in routine clinical
practice. Therefore, our future work will focus on developing
simple sampling strategies and a high-throughput selected reaction
monitoring assay to reliably measure the 11-protein signature.

In summary, we have developed an 11-protein signature to predict
TNBC patients who develop a distant metastasis with high sensitivity,
specificity, positive predictive value, and negative predictive value
(82.3% to 98.9%). Therefore, our signature could aid in clinical practice
to avoid unnecessary treatment with adjuvant chemotherapy of
LNN TNBC patients. Future prospective clinical trials are needed to
further consolidate the validity of the 11-protein signature.